Wikidata Vandalism Detection - The Loganberry Vandalism Detector at WSDM Cup 2017

نویسندگان

  • Qi Zhu
  • Hongwei Ng
  • Liyuan Liu
  • Ziwei Ji
  • Bingjie Jiang
  • Jiaming Shen
  • Huan Gui
چکیده

Wikidata is the new, large-scale knowledge base of the Wikimedia Foundation. As it can be edited by anyone, entries frequently get vandalized, leading to the possibility that it might spread of falsified information if such posts are not detected. The WSDM 2017 Wiki Vandalism Detection Challenge requires us to solve this problem by computing a vandalism score denoting the likelihood that a revision corresponds to an act of vandalism and performance is measured using the ROC-AUC obtained on a held-out test set. This paper provides the details of our submission that obtained an ROC-AUC score of 0.91976 in the final evaluation.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Ensemble Models for Detecting Wikidata Vandalism with Stacking - Team Honeyberry Vandalism Detector at WSDM Cup 2017

The WSDM Cup 2017 is a binary classification task for classifying Wikidata revisions into vandalism and non-vandalism. This paper describes our method using some machine learning techniques such as under-sampling, feature selection, stacking and ensembles of models. We confirm the validity of each technique by calculating AUC-ROC of models using such techniques and not using them. Additionally,...

متن کامل

Overview of the Wikidata Vandalism Detection Task at WSDM Cup 2017

We report on the Wikidata vandalism detection task at the WSDM Cup 2017. The task received five submissions for which this paper describes their evaluation and a comparison to state of the art baselines. Unlike previous work, we recast Wikidata vandalism detection as an online learning problem, requiring participant software to predict vandalism in near real-time. The best-performing approach a...

متن کامل

Large-Scale Vandalism Detection with Linear Classifiers - The Conkerberry Vandalism Detector at WSDM Cup 2017

Nowadays many artificial intelligence systems rely on knowledge bases for enriching the information they process. Such Knowledge Bases are usually difficult to obtain and therefore they are crowdsourced: they are available for everyone on the internet to suggest edits and add new information. Unfortunately, they are sometimes targeted by vandals who put inaccurate or offensive information there...

متن کامل

A Production Oriented Approach for Vandalism Detection in Wikidata - The Buffaloberry Vandalism Detector at WSDM Cup 2017

Wikidata is a free and open knowledge base from the Wikimedia Foundation, that not only acts as a central storage of structured data for other projects of the organization, but also for a growing array of information systems, including search engines. Like Wikipedia, Wikidata’s content can be created and edited by anyone; which is the main source of its strength, but also allows for malicious u...

متن کامل

Building Automated Vandalism Detection Tools for Wikidata

Wikidata, like Wikipedia, is a knowledge base that anyone can edit. This open collaboration model is powerful in that it reduces barriers to participation and allows a large number of people to contribute. However, it exposes the knowledge base to the risk of vandalism and low-quality contributions. In this work, we build on past work detecting vandalism in Wikipedia to detect vandalism in Wiki...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1712.06922  شماره 

صفحات  -

تاریخ انتشار 2017